Exploration and Reduction of the Feature Space by Hierarchical Clustering
نویسندگان
چکیده
In this paper we propose and test the use of hierarchical clustering for feature selection. The clustering method is Ward’s with a distance measure based on GoodmanKruskal tau. We motivate the choice of this measure and compare it with other ones. Our hierarchical clustering is applied to over 40 data-sets from UCI archive. The proposed approach is interesting from many viewpoints. First, it produces the feature subsets dendrogram which serves as a valuable tool to study relevance relationships among features. Secondarily, the dendrogram is used in a feature selection algorithm to select the best features by a wrapper method. Experiments were run with three different families of classifiers: Naive Bayes, decision trees and k nearest neighbours. Our method allows all the three classifiers to generally outperform their corresponding ones without feature selection. We compare our feature selection with other state-of-the-art methods, obtaining on average a better classification accuracy, though obtaining a lower reduction in the number of features. Moreover, differently from other approaches for feature selection, our method does not require any parameter tuning.
منابع مشابه
Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy
Dimensionality reduction methods transform or select a low dimensional feature space to efficiently represent the original high dimensional feature space of data. Feature reduction techniques are an important step in many pattern recognition problems in different fields especially in analyzing of high dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...
متن کاملیک روش مبتنی بر خوشهبندی سلسلهمراتبی تقسیمکننده جهت شاخصگذاری اطلاعات تصویری
It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been done in order to develop multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...
متن کاملA Nonlinear Dimensionality Reduction Using Combined Approach to Feature Space Decomposition
In this paper we propose a new combined approach to feature space decomposition to improve the efficiency of the nonlinear dimensionality reduction method. The approach performs the decomposition of the original multidimensional space, taking into account the configuration of objects in the target low-dimensional space. The proposed approach is compared to the approach using hierarchical cluste...
متن کاملروش نوین خوشهبندی ترکیبی با استفاده از سیستم ایمنی مصنوعی و سلسله مراتبی
Artificial immune system (AIS) is one of the most meta-heuristic algorithms to solve complex problems. With a large number of data, creating a rapid decision and stable results are the most challenging tasks due to the rapid variation in real world. Clustering technique is a possible solution for overcoming these problems. The goal of clustering analysis is to group similar objects. AIS algor...
متن کاملLinking Multi-dimensional Feature Space Cluster Visualization to Surface Extraction from Multi-field Volume Data
Data sets resulting from physical simulations typically contain a multitude of physical variables. It is, therefore, desirable that visualization methods take into account the entire multi-field volume data rather than concentrating on one variable. We present a visualization approach based on surface extraction from multi-field volume data. The extracted surfaces segment the data with respect ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008